Consensus-Based Information Filtering in Distributed LiDAR Sensor Network for Tracking Mobile Robots

A distributed state observer is designed for state estimation and tracking of mobile robots amidst dynamic environments and occlusions within distributed LiDAR sensor networks. The proposed novel framework enhances three-dimensional bounding box detection and tracking utilizing a consensus-based information filter and a region of interest for state estimation of mobile robots. The framework enables the identification of the input to the dynamic process using remote sensing, enhancing the state prediction accuracy for low-visibility and occlusion scenarios in dynamic scenes. Experimental evaluations in indoor settings confirm the effectiveness of the framework in terms of accuracy and computational efficiency. These results highlight the benefit of integrating stationary LiDAR sensors’ state estimates into a switching consensus information filter to enhance the reliability of tracking and to reduce estimation error in the sense of mean square and covariance.


Introduction
The evolution of distributed sensor networks (DSNs) and remote sensing has led to their application in motion planning and for the control of mobile autonomous systems for surveillance, environmental monitoring, warehouse management, and intelligent transportation [1][2][3][4][5][6].
Comprising spatially distributed sensors installed on infrastructures or mobile robots, DSNs can process local environmental data and generate a comprehensive scene overview through centralized or distributed state estimation [7][8][9][10][11], while maintaining asymptotic stability of the estimation error dynamics.As beneficial as these networks may be, the challenge is reliable estimation/sensing at the node (e.g., visual inference, point cloud clustering), particularly when it comes to target tracking in dynamic environments subject to partial or full occlusion [12][13][14][15].Additionally, emerging research suggests that radar-based and UWB (Ultra-Wideband)-based distributed sensor networks offer promising alternatives for the localization and state estimation of mobile robots/vehicles [16,17], potentially enhancing accuracy and robustness in environments where LiDAR may encounter limitations.
The complexity and time-sensitivity of visual and LiDAR-based tracking of moving targets demand distributed observer design using continuous communication or eventtriggered architecture [18][19][20][21][22], as accurate detection and pose estimation are integral for the robust and scalable tracking which is required for distributed motion planning and control of multiple mobile robots.In this regard, computationally efficient LiDAR-based state estimation at the edge (i.e., stationary sensing units), which provides accurate depth measurements provided by point clouds (removing the complexity of depth estimation and disparity map generation with a stereo/monocular camera), and maintains privacy (due to not processing of any images in the scene), has been the focus of the recent literature on remote estimation in DSNs [23][24][25][26].
However, processing the LiDAR point cloud data to obtain robots' three-dimensional bounding boxes, including their headings, and removing outliers are challenging and have been addressed through geometrical (i.e., model-based), filter-based, and end-to-end learning methods [27][28][29].Despite this, the inherent variability in mobile robots'/ground vehicles' sizes and occlusion in dynamic scenes are the main challenges for reliability in detection and heading estimation in both model-and learning-based clustering and pose estimation approaches [30][31][32][33].After reliable inference and pose estimation at the sensor node level, distributed estimation based on the Kalman consensus filter provides accurate estimates and convergence [34,35], depending on the topology of the networked sensors, guaranteeing the optimality of the variance and the stability of the error dynamics.Existing estimation methods in sensor networks, which centralize processing or use micro Kalman filters at each sensor node, impose considerable computational loads and require extensive inter-node communication during tracking and state estimation.These challenges are exacerbated in dynamic environments with multiple mobile robots and occlusion cases, impacting scalability and efficiency.Additionally, conventional L-shape fitting techniques for LiDAR-based tracking of mobile robots/vehicles are restricted to 2D data and necessitate sensors at the same vehicle height, limiting sensor placement flexibility and adaptability.
This paper develops a computationally efficient remote state estimation using a distributed architecture over LiDAR sensor networks, to address the limitations mentioned above and occlusion and limited visibility for the detection and tracking of mobile robots, through two key contributions:

•
A generic LiDAR-based 3D bounding box detection and tracking method is designed to accommodate a wider range of sensor locations within DSNs; • A distributed switching observer is designed to handle dynamic and occluded scenarios, to reduce overall computational cost, and to introduce short-term predictive capacity.
The remainder of the paper is organized as follows: Section 2 presents the background and point cloud processing.Section 3 introduces the designed distributed observer.Section 4 evaluates the performance and computational efficiency of the developed remote sensing framework in several experiments including occluded and dynamic scenes.Section 5 concludes this paper.

Background and Point Cloud Processing
The position vector p r of a mobile robot centroid and the orientation o r of the bounding box of the tracked robot are defined by where x r , y r , and z r represent the coordinates of the centroid in the fixed world frame {G}, while ϕ r , ψ r , and θ r are the bounding box (roll, pitch, and yaw) orientation angles.In order to develop the state observer and point cloud clustering, the following assumptions are made in this paper: (i) any variations in the z-value of the robot position are negligible (i.e., our case study is for the pose estimation and tracking of wheeled robots as shown in Figure 1); and (ii) the position p s and orientation o s of the sensor nodes are known in the global frame {G}.There is also no prior knowledge of the environment and dimensions of the robot.The computational challenge posed by high-dimensional point clouds is mitigated through voxelization (for dimensionality reduction).Given a point cloud C ∈ R N×3 , where N is the number of points, each represented by (x, y, z), the application of the Voxel Grid Filter can be represented as in which α ∈ R is the parameter controlling the size of the voxels, and is the resulting filtered point cloud with N v < N points.This divides the point cloud into a three-dimensional grid and selects a single representative point from each grid cell, discarding the rest.Each voxel v i is a cubic region of volume α 3 , and each point p ∈ C is assigned to a voxel based on its coordinates, determined by (i x , i y , i z ) = x α , y α , z α , where (i x , i y , i z ) are the voxel indices in the grid corresponding to each point p, defining the voxel v i to which the point is assigned.All points p within each voxel v i are then represented by the centroid c i , which is the arithmetic mean of the points in the voxel calculated by c i = 1 N (v i ) ∑ p∈v i p, where N (v i ) is the number of points in the voxel.This approach effectively reduces the data volume while preserving the essential structural features of the point cloud.The point cloud data (obtained from the solid-state Robosense LiDAR shown in Figure 1) require transformation for alignment with the global frame {G}.The transformation of all points p ∈ C v involves rotation performed using a rotation matrix R ∈ R 3×3 derived from the Euler angles where R x (ϕ s ), R y (ψ s ), and R z (θ s ) represent the rotation matrices around the x-, y-, and z-axes, respectively.The point cloud in the global frame is then expressed by in which t = x s y s z s ⊤ is the translation vector.Considering the first assumption, we employ a thresholding technique to identify and eliminate ground points from the point cloud as in which C ′ ∈ R N ′ ×3 includes the ground-less point cloud, h f is the elevation of the ground floor in the frame G, and ϵ is a slack variable to deal with potential noise or deviations from the ideal floor plane.While this approach effectively removes the majority of the ground points, it may also remove low-elevation non-ground points.Since the geometry of other operating robots in the environment is known and the aim of this research is accurate clustering and reliable distributed robot state estimation (not generating a cost/occupancy map), even the unexpected exclusion of some low-elevation points due to thresholding will not significantly affect the performance of the proposed framework.

Clustering and Principle Component Analysis
To partition the point cloud into different groups, Euclidean clustering is employed.Given the point cloud C ′ , Euclidean clustering involves finding the set of clusters C c = {C 1 , C 2 , . . ., C k } that best represents the underlying structure of points p ∈ C ′ , where each cluster C i is defined as a set of neighboring points that are close to each other in the 3D space, as in where i = {1, . . ., k} denotes the cluster index, j = {1, . . ., N ′ } represents individual points within the point cloud, and ρ is a predefined distance threshold.It is important to note that the value of ρ must be at least as large as the leaf α of the voxels.If ρ ≤ α, no clusters will be formed due to the inability of points to meet the proximity criteria.Each cluster C i is characterized by computing the spans between the maximum and minimum values for height and length along the xand y-axes.The selection of the clusters corresponding to the robot is achieved by comparing these dimensions to the prior geometry/size information of the robot operating in the environment.This comparison specifically considers the robot's shortest side and its longest diagonal to ensure the dimensions fall within predefined thresholds.By comparing the current clusters with the previous estimates, we associate each selected cluster (mentioned above) with the corresponding object through maximum likelihood data association.For each cluster C i ∈ C c , the association likelihood is calculated by evaluating the distance d i between the centroid of the cluster and the prior estimated position p r .The cluster with the centroid closest to the past position estimate has the highest likelihood considering the bounds on the robot's speed and acceleration.The chosen association is where L i is maximal, as in The aim is to fit a 3D bounding box around each cluster of points obtained, accurately representing the shape and dimensions of the robot in the scene.The proposed method modifies the conventional L-shape fitting [36,37], which is effective when the robot's point cloud consists of points belonging to the primary two edges of the moving robot.The LiDAR's bird's-eye-view estimated pose reveals the robot's top in the point cloud.
To isolate the robot's edges, the cluster C i is projected onto the xy-plane and encapsulated by a convex hull, CH(C i ), which is the smallest convex polygon encompassing all points in cluster C i as shown in Figure 2.This method both identifies the mobile robot's edges and regularizes its shape.If the robot shape is not a complete box (due to protruding wheels or a manipulator), the convex hull algorithm normalizes the estimated cluster edges into rectangles.Let p h denote each point on the convex hull, where h = {1, 2, . . ., N}, and N is the total number of points constituting the convex hull CH(C i ).To classify the remaining points, we analyze the angle β h for each point p h ∈ CH(C i ) with respect to the infrastructure-mounted LiDAR position p s .By finding the lower bound x h ), the corresponding point p β min can be identified.Similarly, by finding the upper bound, β max = max{β h }, the point p β max can be determined.Hence, the points p β min and p β max correspond to the two primary edges of the mobile robot, as they represent the most laterally external points within the point cloud based on their angular positions.To classify whether each p h ∈ CH(C i ) belongs to the main edges, we examine the relative position of each p h with respect to the LiDAR position p s and the line segment defined by the two points p β min and p β max .Initially, the vector v that links the two points is computed by v = p β max − p β min .Subsequently, the scalar product w j of the vector v and the vector from the sensor position p s to each point p j ∈ C i is defined by w j = v • (p j − p s ).As schematically represented in Figure 3a, the scalar dot product w j is then used as an indicator of the position of p j with respect to p s .

Figure 3. (a)
Lateral-external points and L-shape reduction: (i) if w j ≥ 0, the point p j lies on the same side of the decision boundary (dashed blue line) as the sensor, or is collinear with the line segment defined by p β min and p β max (it is considered to be part of the main edges and is retained), and (ii) if w j < 0, the point p j is removed from C i ; (b) closest point and RANSAC line fitting.
The nearest corner point p n , which corresponds to the point with the minimum Euclidean distance between all points in the target robot and the sensor position p s , is identified by The cluster C i is then divided into two distinct subsets, namely C A i and C B i , based on the angle each point forms with respect to the corner point p n .The angle γ j between point p j and p n is computed by γ j = tan −1 y j −y n x j −x n .Consequently, the points with γ j ≤ π/2 are included in the set C A i , while the remaining points are included in the set C B i , as in The lines l A i and l B i are then fitted to the point sets C A i and C B i using Random Sample Consensus (RANSAC) as shown in Figure 3b.The orientation difference between the two lines is denoted by where θ A i and θ B i are the orientations of lines l A i and l B i , respectively.If the difference δθ exceeds a predetermined threshold δθ, the lines l A i and l B i are accepted as the two edges of the target mobile robot.Consequently, the orientation of the robot, θ r , is determined by the longer of the two fitted lines, l A i or l B i .The scale of the bounding box is then determined by If the angular difference δθ is within the threshold δθ, it suggests that points align along a single robot edge.This results in inaccurate bounding box identification using conventional L-shape detection methods.Afterwards, in the augmented LiDAR-based state clustering module (of the proposed estimation framework) in this paper, principal component analysis (PCA) is used to identify the bounding box's major axis based on the first principal component of the point cloud transformed to a new coordinate system; the second and third components denote the bounding box minor axes.This transformation is achieved through the eigenvectors and eigenvalues of the cluster C i covariance matrix using W i ∈ R 3×3 and ) and V(•) denotes the eigenvector matrix of (•).The orientation θ r of the robot in the new coordinates is given by the angle of the first principal component, , where W n,m denotes the element in the n-th row and m-th column of the eigenvector matrix W i .The length l r and width w r of the bounding box correspond to lengths of the bounding box along the xand y-axes in the new coordinate system, respectively, and are determined as in which X PCA,1 and X PCA,2 denote the column number of X PCA .

Distributed Estimator Design
To achieve consistent robot pose estimates and reliable dynamic object tracking, a distributed estimation framework is designed in this section using a consensus filter.The designed consensus filter utilizes covariance dual-rate self-tuning, and provides a computationally efficient solution through the integration of short-term prediction and resource allocation in the distributed LiDAR sensor network for the estimation of robots' states.The state estimation is conducted for each detected robot, allowing for the simultaneous tracking of robots' states in dynamic environments.The architecture of the distributed state observer is provided in Figure 4 for each network node.In this regard, the state variable vector of the dynamical system in discrete time is defined as where x r (k), y r (k), and θ r (k) are the translational position of the robot's centroid and its yaw angle (about the z-axis) in the world frame {G} at time k ∈ {0, 1, 2, . . .}, while ẋr (k), ẏr (k), and θr (k) are the translational velocities of the robot centroid and its yaw rate, respectively.The dynamic process in discrete time, which is used for the remote state estimation alongside the depth measurement in the distributed observer, is where ϑ(k) is the bounded noise of the process.This motion model is also used to address occluded scenes and intermittent cluster identification.The assumption of the constant acceleration is valid between two consecutive frames (within the sample time T s = 50 ms, which is sufficiently small for the robot maximum speed of v x = 1.2 m/s to warrant this approximation) and the acceleration is updated at every time step using a moving horizon forecasting method described at the end of this subsection.The state and input matrices then yield in which T s is the sample time between two consecutive frames, and I is the identity matrix.
The sensor model of the stationary sensing unit i to track the state of the target/robot within the LiDAR sensor network with communication topology G = (V, E ) is where w(k) is the measurement noise of the sensor i and H i (k) is the observation matrix of the sensor i. Process and measurement noises within the distributed sensor setting are assumed to be uncorrelated, i.e., E ϑ k w ⊤ k = 0.The input u(k) = [a x (k), a y (k), θr (k)] ⊤ has the perceived longitudinal/lateral translational and rotational accelerations of the clustered point cloud centroid, and is obtained over a moving horizon using position and speed gradients.In this regard, the acceleration input is estimated remotely using a moving horizon n h ∈ N over past estimated states, as in u(k) ≜ E{a k }, where where the finite difference of ṗr,k = [v r,x , v r,y , ψk ] ⊤ over sample time T s is used to approximate the time derivative d dt ṗr,k .For ease of notation, the updated values of the state variable and information matrices, which will be introduced in the next subsection, are denoted by x + , and subscripts k are dropped.As a result, the process dynamics (15) and the sensor model of the stationary sensing unit i of the network are written as The motion model dynamically incorporates changes in the system response over time by continuously updating the control inputs for the consensus filter, and enables reliable tracking of the robot in the designed consensus filter in case of occlusion in dynamic environments.

Consensus Information Filter
The observer tracks the robot over the distributed sensor network, and the aim is for the network to provide a set of state estimates xi (k) of the robot through the local exchange of messages among close/neighboring sensing units.At each time instance k, the state observer allocates the resource to only one stationary sensing unit denoted by the base node B(k) for state estimation in the global frame using a consensus information filter, in which the measurement y i (with covariance R i ) of the stationary sensing unit s i is used to calculate the information vector z i = H ⊤ i R −1 i y i and the information matrix If the stationary sensing unit s i , which has the robot in its field of view, matches the base node allocation B(k) for time k, it checks for any available measurements y j (with covariances R j ) from the following set: where N i denotes the neighbors of node i on the network (with topology G(k), which could be a dynamic/switching graph).If found, the sensing unit i receives the pair from the other sensing unit j ∈ S i , and transforms them into the following information form: Then, the following distributed observer scalable in n (developed in [35,38]) is used to estimate the robot pose with xi as the set of prior estimates (predictions) of the state x(k): in which L i = (P −1 i + Īi ) −1 is the consensus information filter gain, x+ i is the state update of the local information filter, and ξ = ι 1+∥P i ∥ .The small constant ι > 0 is chosen in the order of the (integration) time interval T s used for discretization of the continuous-time constant-acceleration motion model.∥P i ∥ = tr(P ⊤ i P i ) denotes the Frobenius norm of the matrix P i .It should be mentioned that the distributed state observer in ( 22) is an information filter form of the Kalman consensus filter provided in the following: with the update of the prior estimates x+ i = Ax i + Bu i , and the Kalman gain , where I is the identity matrix with proper dimensions.
Remark 1.The (global) asymptotic stability of the error dynamics of the Kalman consensus filter (23) with the choice of consensus gain η i = ξS i is proved in [35], Theorem 2, where all sensor node estimators asymptotically reach a consensus In the case where no measurement is available through stationary sensors (i.e., S = ∅), the state estimate is the predicted pose by the motion model (15), with u(k) as the behaviourbased input obtained over a horizon n h using (18).The base node B(k) for the next time step is selected based on a distance function d(s j ) that measures the Euclidean distance between the estimated state and the positions of the stationary sensing units: where x(1:2) j ≜ [ xr,j (k), ŷr,j (k)] ⊤ includes the first two elements of the estimated state xj in the 2D plane at time k.Then, the sensor node closest to the predicted state is chosen as the next allocated resource B(k + 1) ≜ s j , where j = arg min d(s j ).The stationary sensor s i broadcasts the updated and predicted state estimates, the updated covariance, and the base node B(k + 1) for the next time step.The decision to allocate resources based primarily on spatial distance arises from the inherent spatial distribution observed among both the robot and the sensor nodes in the scene.This spatial distribution emerges as a result of factors such as the physical dimensions of robots and environmental constraints in dynamic scenes, which collectively ensure a dispersion of robots across the operational environment.
If sensor node s i is not assigned as the base node B(k), it functions as a contributing node within the network.Non-base nodes with available measurements transmit the measurement vectors y j and the corresponding measurement covariance matrices R j to B(k).Independently of their measurement status, all nodes subsequently receive the most recent state estimates x, the prediction state x, the covariance P, and the identifier B k+1 of the next node scheduled for base status from B k .

Covariance Dual-Rate Self-Tuning
The need for a dual-rate tuning method is that the noise in the proposed distributed state observer (due to LiDAR-based sensing and point cloud clustering) is time-varying and changes depending on the distance of the target vehicle/robot from the stationary sensing unit (i.e., the solid-state LiDAR sensor).Therefore, two initial covariances are used for the information filter: Σ c , associated with d c , which is the closest Euclidean distance (for close-range measurements); and Σ f , associated with the farthest Euclidean distance d f from the sensing unit (for far-range measurements).The covariance allocation during the measurement update in the consensus filter is then a linear interpolation R k = a 1 Σ f + a 2 Σ c , using the measured distance d (obtained by clustering and l r , w r ) with d = d f − d c .

Experimental Results and Discussion
Several experiments were conducted in an indoor setting comprising various static and dynamic objects (with the operator walking around the robot) to evaluate the performance and computational efficiency of the developed LiDAR-based remote sensing and distributed estimation framework.The experimental methodology included testing in (i) full occlusion and (ii) partial occlusion in dynamic environments to evaluate the proposed distributed estimator's accuracy, convergence, and computational efficiency using one and two LiDAR remote sensing units.Furthermore, high-fidelity simulations using Matlab (2023b) Automated Driving Toolbox were conducted for the (i) state estimation of multiple skid-steer mobile robot models using a network of solid-state LiDAR sensors (as shown in Figure 5), and (ii) identification of optimal coverage areas.The global frame of reference {G}, which located its origin at one of the corners of the testing hall/room, was established for all test scenarios.The mobile test platform was a Clearpath's Husky skid-steer UGV, with a maximum speed of 1.2 m/s for the conducted indoor tests in dynamic environments.The experiments employed a dual solid-state LiDAR setup (with horizontal and vertical fields of view of 120 • and 25 • , respectively, a range of 150 m, a region of interest (ROI) feature for better resolution around fields of view limits, and accurate line separation, i.e., 3-4 cm detection range), each paired with an embedded Jetson Xavier NX.The first LiDAR was oriented at a pitch of 25 • and a yaw of −90 • ; the second LiDAR had a pitch of 15 • and a yaw of 170 • degrees.This arrangement ensured comprehensive coverage of the area.The experimental methodology was designed to investigate two distinct scenarios (mentioned above), each tested five times in dynamic scenes under similar nominal control inputs to the UGV during navigation with obstacle avoidance.Throughout all testing scenarios, the UGV performed dynamic motion planning, as well as followed a consistent navigational path, and operated without any communication with the distributed sensor network.
The qualitative/quantitative experimental results are provided in the following subsections for two main scenarios: with only one stationary LiDAR sensor (to evaluate the clustering, tracking, and pose estimation), and with two remote sensing units.In this section, (i) "KCF-L" denotes the Kalman consensus filter architecture [35,39] which utilizes the proposed point cloud processing and pose estimation from Section 2; and (ii) "Obs."denotes the developed consensus information filter, including point cloud processing and LiDAR-based distributed pose estimation (i.e., the whole pipeline proposed in this paper).

Occlusion with One Remote Sensing Unit
In the first test scenario, only LiDAR 1 was used for state estimation.The estimation results for this scenario are graphically illustrated in Figure 6.The results are consistent throughout the entire set of runs; therefore, only one run is displayed in the figure.Subsequently, the quantitative data presented are mean values computed from all runs.In the event of measurement unavailability, both the Kalman consensus filter (KCF-L) and the designed distributed observer use the predicted state as the estimation until the updated measurements for the subsequent step.However, KCF-L uses a general dynamics model without input estimation, resulting in constant velocities throughout the occluded area.Contrarily, the proposed framework estimates the input acceleration (for the dynamical process) using a moving horizon and the velocity magnitude right before the occlusion instance.This allows the designed framework to make accurate predictions of changes in the robot heading angle, thus reducing the estimation error after being updated by measurements from LiDAR 1.The point cloud representation of the scene partially occluded by a moving obstacle (i.e., operator) is shown in Figure 7, which confirms consistent clustering and state estimation using one remote sensing unit.
Position and orientation estimation errors are in Table 1, where MAE, MSE, and RMSE denote mean absolute error, mean squared error, and root mean square error, respectively.These values are mean quantitative results derived from the entire set of runs, which were five consecutive tests in the environment with dynamic objects and similar nominal control inputs (for the purpose of obstacle avoidance.The experimental studies confirm that as long as the robot trajectory does not change drastically between sample times for occluded scenes, the constant-acceleration motion model can be used for full occlusion.
Furthermore, it is important to note that the original bounding box L-shape fitting method fails to function under bird's-eye-view conditions.This method was designed for sensors positioned at the same height as the vehicle; it effectively utilizes angled views to estimate dimensions and orientation.

Two Sensor Nodes
In Scenario 2, the performance of the developed framework is evaluated using two sensor nodes.Initially, the environment does not present any significant occluding obstacles.Subsequently, the presence of the moving operator in Areas 1 and 2 leads to the formation of two occluded areas, discernible in Figure 8 between points A 1 and A 2 and between B 1 and B 2 .
For occlusion A instances, the alteration is not significant as the robot maintains a straight path throughout the occlusion.In contrast, occlusion B presents more noticeable differences as the robot follows a curve and transitions into the second area, necessitating a shift in the sensors' field of view (from L 1 to L 2 ).Although occlusion B incites an increase in the absolute error, the maximum estimation error by the proposed method does not exceed 11 cm for the position estimation, as shown in Figure 8.
The position and heading angle estimation results are compared in Table 2, which highlights improvements in the MSE, MAE, and error covariance using the proposed framework.Even though the performances of both state observers are very close for this scenario, under full robot visibility, Obs.demonstrates superior performance under pedestrian occlusion.This is attributed to the input estimation, which is a feature not employed by the KCF-L.Several experimental tests and error convergence analyses in various high-/lowexcitation maneuvers with occlusion provide insightful observation on the impact of perceptually degraded conditions on the performance of the designed LiDAR-based distributed switching state observer in this paper: (i) When the robot is fully visible by SSUs, the developed observer and the KCF-L (which uses the developed computationally efficient clustering approach, but utilizes the conventional Kalman consensus filter) perform reliably for position estimation; however, less heading estimation error for "Obs."offers an advantage in terms of computational cost on a network level, making it a more efficient choice in scenarios where computational resources are a limiting factor.(ii) The variance in the estimation error by the proposed framework is lower than that of KCF-L in almost all experiments.This is attributed to the utilization of the region of interest for outlier identification/rejection in the new distributed estimation framework.
The solid-state LiDAR used for experiments features accurate line separation at a 3-4 cm detection range, generating accurate point clouds with high frequencies of 50 Hz in indoor settings.The proposed distributed estimation framework enables processing and publishing the pose estimates with a sample time T s = 50 ms.Considering the maximum robot speed of v x = 1.2 m/s, the maximum range of 70 m for indoor operation, and the sampling time of 50 ms (including the process required for clustering/estimation), the utilized LiDAR enables accurate estimation (RMSE < 13 cm even with occlusion) with various robot speeds.For other outdoor settings with higher vehicle speeds, e.g., intelligent transportation, the correlation between the speed and accuracy of LiDAR point clouds is an important aspect and needs to be taken into account.
The computational times of the proposed distributed state observer are shown in Figure 9 for both scenarios (with occlusions) using a 6-core NVIDIA Carmel ARM 1.9 GHz with 16 GB memory (available through Jetson Xavier NX embedded systems for edge computing purposes), and confirm the computational efficiency of the framework, i.e., <100 ms, which is sufficient for the accurate motion planning of mobile robots with maximum speeds of 2 m/s, and is important for scalability in larger sensor networks.

Conclusions
In this paper, a remote LiDAR-based state estimation (and tracking) framework is developed and experimentally validated for mobile robots in distributed sensor networks by designing a consensus information filter and computationally efficient point cloud processing.As confirmed by experiments, the proposed approach hinges on two primary contributions: a LiDAR-based 3D bounding box detection and tracking method that expands the versatility of sensor placements in distributed sensor networks, and a node-switching distributed observer to address occlusion and uncertainties in dynamic environments, achieving average estimation errors as low as 2 cm and 2 • under nominal conditions (<4 cm and <3.5 • under occlusions).These advancements significantly reduce computational demands (i.e., less than 100 ms) while incorporating short-term motion predictive capabilities, thereby enhancing the scalability and efficiency of mobile robot tracking in dynamic scenes with human presence while maintaining privacy.Through experimental evaluations and high-fidelity simulations in indoor settings, the effectiveness of the framework in terms of the accuracy of the estimation and the asymptotic stability of the error dynamics is demonstrated in normal and arduous scenarios (including occlusion, limited visibility, and dynamic obstacles) and compared with the Kalman consensus filter.Future work will focus on further refining the predictive modeling in large sensor networks and exploring its applications in unstructured operational settings with multi-robot tracking.Additionally, research will aim to move beyond the assumption of flat terrain and negligible z-axis variations in robot positions to handle more complex and varied terrain for field applications with stationary sensing units.

Figure 1 .
Figure 1.Experimental setup for evaluation of the LiDAR-based distributed state observer: unmanned ground vehicle (UGV) setup (left), and solid-state LiDAR used for remote sensing (right).

Figure 2 .
Figure 2. The xy-projection of the UGV robot's clustered point cloud in real time for z s ≤ z r (left), for z s > z r (center), and the convex hull of the cluster xy-projection (right).

Figure 4 .
Figure 4. Overview of the distributed state observer on each node s i in the sensor network.

Figure 5 .
Figure 5. High-fidelity simulation of LiDAR coverage in indoor setting.

Figure 4 :
Figure 4: Caption for second figure

Figure 6 .
Figure 6.Tracked position and its absolute error over time in Scenario 1, where S is the starting point for the trajectory.

Figure 7 .
Figure 7. Point cloud clustering in (a) a dynamic scene with moving objects, (b) partial occlusion, and (c) full occlusion by an operator moving alongside the robot (in blue).

Figure 4 : Caption for second figure 1 Figure 8 .
Figure 4: Caption for second figure

1 Figure 9 .
Figure 9. Computational time of the proposed remote sensing framework for Scenario 1 (top) and Scenario 2 (bottom) subject to occlusion and dynamic objects moving around the robot.

Table 2 .
Position/heading estimation errors in Scenario 2 with 2 remote sensing units and occlusion scenarios.